How Good Was That Punt Return? Introducing Return Yards Over Expected (RYOE)
How good was that punt return?
At first glance, it’s obvious that Isaiah McKenzie’s 84-yard touchdown return was a phenomenal individual effort. However, is it possible to quantify what made the return special? What would an average punt returner have achieved in the same set of circumstances?
The goal of this analysis is to create and apply a new metric, Return Yards over Expected (RYOE), to do just that. This new metric can be used to evaluate punt returns at both the individual and team levels! Sections II and III contain analysis and insights gleaned from the new metric, while Section IV and the Appendix contain more technical information about how I created the RYOE metric.
Overview of Results
The RYOE metric can bring nuance to the discussion around punt returner and coverage team performance!
Individual Performance
For example, while some players might rank highly in the traditional metrics of “average yards per punt return”, those same players might actually be underperforming relative to the punt return situations that they face.
At the same time, some players might not habitually break off punt returns for high yardage, but might perform better than expected consistently, thus providing a sort of measure of dependability.
Team Performance
According to the RYOE metric, the New England Patriots were the best performing unit in both punt returns and punt coverage.
II. Analysis - 2020 Season
Comparison with/Improvements Upon Current Metrics
Currently, many punt returner rankings are based on yards per return or total return yards. However, these metrics are lacking in at least the following ways:
- They don’t account for the unique conditions faced on each punt
- They can be highly skewed by one or a few long returns
- They don’t convey any notion of reliability
Using RYOE, I was able to create four season-long individual metrics to answer the following questions:
- Given the punt situations they faced, which punt returner overperforms expectations by the most?
  - Average RYOE per punt faced
- Regardless of specific yardage they gain, which punt returner overperforms expectations the most often?
  - % of punt return situations resulting in a POSITIVE RYOE
- Regardless of specific yardage they gain, which punt returner underperforms expectations the most often?
  - % of punt return situations resulting in a NEGATIVE RYOE
- Which punt returner returns the most punts when the situation calls for a fair catch?
  - % of punts returned when the expected return yards (ERY) value is 0
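As a minimal sketch of how these four season-long metrics could be computed from play-level data, the Python below (not the R used for this project) assumes each punt is a record with hypothetical fields `returner`, `actual_yards`, `ery`, and `returned`, and that ERY has already been rounded to whole yards so that "ERY is 0" is a meaningful test. The sample rows are made up purely for illustration.

```python
from collections import defaultdict


def season_metrics(punts):
    """Summarize RYOE per returner.

    `punts` is an iterable of dicts with hypothetical keys:
    returner, actual_yards, ery (expected return yards), returned (bool).
    """
    by_returner = defaultdict(list)
    for punt in punts:
        by_returner[punt["returner"]].append(punt)

    summary = {}
    for name, plays in by_returner.items():
        # RYOE for a single punt: actual yards minus expected yards.
        ryoe = [p["actual_yards"] - p["ery"] for p in plays]
        n = len(plays)
        # Punts where the model expected no return yardage (assumes rounded ERY).
        zero_ery = [p for p in plays if p["ery"] == 0]
        summary[name] = {
            "punts_faced": n,
            "avg_ryoe": sum(ryoe) / n,
            "pct_positive": sum(r > 0 for r in ryoe) / n,
            "pct_negative": sum(r < 0 for r in ryoe) / n,
            "pct_returned_when_ery_zero": (
                sum(p["returned"] for p in zero_ery) / len(zero_ery)
                if zero_ery
                else None
            ),
        }
    return summary


# Made-up rows for a fictional returner, for illustration only.
example_punts = [
    {"returner": "Returner A", "actual_yards": 12, "ery": 8, "returned": True},
    {"returner": "Returner A", "actual_yards": 0, "ery": 0, "returned": False},
    {"returner": "Returner A", "actual_yards": 3, "ery": 0, "returned": True},
    {"returner": "Returner A", "actual_yards": 2, "ery": 6, "returned": True},
]
print(season_metrics(example_punts))
```

Note that the positive and negative percentages need not sum to 100%, since a punt with an RYOE of exactly 0 counts as neutral.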
The interactive table below contains these metrics, along with other traditional metrics for all returners who faced more than 5 punts:
This table can tell us a lot about different punt returners. With the caveat that some returners only face a few punts in the season, here are just a few insights:
Diamonds in the Rough
When ranked by the traditional measure of average yards per return, Buccaneers returner Kenjon Barner comes in at a paltry 45th. However, he ranks a respectable 12th in terms of RYOE and a phenomenal 2nd in % of punts with a positive RYOE (after only Dede Westbrook, who only faced 5 punts).
This suggests that, although Barner might not have broken off many large returns, he was consistently able to grind out a few extra yards on punt returns. In a “game of inches” like football, this can make all the difference.
Beyond Average Yardage
The top 5 players in terms of avg. yards per punt return were also the top 5 in terms of avg. RYOE. However, Danny Amendola and Jamal Agnew (6th and 7th in avg. yards per punt return) both had a negative average RYOE.
This suggests that, on average, Amendola and Agnew’s high average punt return yardage could partially be a product of the return situations that they’ve faced, rather than individual performance. In fact, because both players had more negative RYOE returns than positive or neutral returns, it seems likely that both players slightly underperformed relative to what might be expected from an average player facing the same situations.
Why did you return that? Measuring “Decision-making” Skills
How often does a returner ignore the option to fair catch and return a punt that they shouldn’t? Dallas punt returner CeeDee Lamb was in this category, choosing to return a punt with an ERY value of 0 yards 50% of the time.
III. Team Performance
Potential Applications: In the same way that individual punt returners can over/underperform expectations, so too can punt coverage units!
So, which teams cover well? Which return well? Which do both well?
The graph below shows the average yards OVER expected (RYOE) per punt for each team’s punt return unit on the x-axis, with the average yards UNDER expected per punt for each team’s coverage unit on the y-axis. Teams in the upper right quadrant excel in both phases, while teams in the lower left quadrant struggle in both.
Notably, the Patriots appear to far exceed any other team in both metrics. In contrast, the Packers and the Rams appear to have difficulty in both punt coverage and punt returns.
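The two axes of that graph can be sketched roughly as follows. This is an illustrative Python snippet, not the project's code: the field names `return_team`, `coverage_team`, `actual_yards`, and `ery` are hypothetical, the quadrants are assumed to be split at zero, and the two sample rows are invented.

```python
from collections import defaultdict


def team_profiles(punts):
    """Return {team: (avg return RYOE, avg coverage yards UNDER expected)}."""
    return_ryoe = defaultdict(list)
    coverage_under = defaultdict(list)
    for punt in punts:
        ryoe = punt["actual_yards"] - punt["ery"]
        return_ryoe[punt["return_team"]].append(ryoe)
        # For the coverage unit, allowing fewer yards than expected is good,
        # so the sign is flipped.
        coverage_under[punt["coverage_team"]].append(-ryoe)

    profiles = {}
    for team in set(return_ryoe) | set(coverage_under):
        ret = return_ryoe.get(team, [])
        cov = coverage_under.get(team, [])
        x = sum(ret) / len(ret) if ret else None
        y = sum(cov) / len(cov) if cov else None
        profiles[team] = (x, y)
    return profiles


def quadrant(x, y):
    """Label a team by quadrant, assuming the axes cross at zero."""
    if x is None or y is None:
        return "incomplete"
    if x > 0 and y > 0:
        return "excels in both"
    if x < 0 and y < 0:
        return "struggles in both"
    return "mixed"


# Invented plays between two fictional teams.
example_punts = [
    {"return_team": "Team A", "coverage_team": "Team B", "actual_yards": 12, "ery": 8},
    {"return_team": "Team B", "coverage_team": "Team A", "actual_yards": 3, "ery": 7},
]
for team, (x, y) in sorted(team_profiles(example_punts).items()):
    print(team, x, y, quadrant(x, y))
```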
IV. Creating the RYOE Metric
Data Overview and Considerations
I created the RYOE metric by using the combined predictions from a random forest model and an XGBoost regression model to predict punt return yardage, and then comparing those predictions with the actual outcomes.
Each model was trained on NFL tracking data and PFF play data from the 2018 and 2019 seasons, and then tested on a holdout dataset consisting of data from the 2020 season.
Random Forest and XGBoost Models
Information on the feature selection and tuning processes will be posted in the future in the project’s GitHub repository.
Features
The plot below exhibits the importance of each feature for both models. Features in gray exhibited less importance than a feature of random numbers, and were therefore considered unimportant. (See Appendix for full list and definition of variables)
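The random-number baseline can be illustrated with a short Python sketch. It assumes the importance scores have already been extracted from a fitted model into a dictionary, with the random-number feature stored under a hypothetical key `random_noise`; the scores shown are invented and are not the project's results.

```python
def split_by_noise_baseline(importances, noise_name="random_noise"):
    """Separate features that beat a random-number feature from those that do not.

    `importances` maps feature name -> importance score and must include
    the random-number feature under `noise_name` (a hypothetical key).
    """
    baseline = importances[noise_name]
    kept, dropped = {}, {}
    for feature, score in importances.items():
        if feature == noise_name:
            continue
        # A feature scoring below pure noise is treated as unimportant.
        if score < baseline:
            dropped[feature] = score
        else:
            kept[feature] = score
    return kept, dropped


# Invented scores for illustration only.
example_importances = {
    "punt_hangtime": 0.31,
    "punt_distance": 0.12,
    "random_noise": 0.05,
    "snap_ok": 0.01,
}
kept, dropped = split_by_noise_baseline(example_importances)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

The appeal of this kind of cutoff is that it needs no hand-picked threshold: the noise feature sets the bar empirically for each model.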
Prediction and Model Performance
I created ERY by ensembling (in this case by averaging) the predicted number of punt return yards from each model. The performance of the ensemble model on the 2020 holdout dataset was as follows, with the performance of each of its component models as reference:
| Model | RMSE | MAE | MAPE |
|---|---|---|---|
| Ensemble | 0.22 | 3.48 | 0.23 |
| RF | 0.24 | 3.68 | 0.24 |
| XGB | 0.68 | 3.41 | 0.23 |
Although the ensemble model fared worse than the XGBoost model in terms of MAE and MAPE, it was the most consistent of the three models across all three metrics.
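For readers who want to see the mechanics, here is a minimal Python sketch of averaging two sets of predictions into ERY, deriving RYOE, and scoring the result. The prediction values are invented, and the handling of MAPE on plays with zero actual yards (they are skipped here) is my assumption for the sketch, not necessarily what was done in the project.

```python
import math


def ensemble_average(rf_pred, xgb_pred):
    """Average the two models' predictions play by play to get ERY."""
    return [(a + b) / 2 for a, b in zip(rf_pred, xgb_pred)]


def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, skipping plays with 0 actual yards.

    Skipping zeros is an assumption made here to avoid division by zero.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Invented numbers for three punts.
actual_yards = [10, 4, 8]
rf_pred = [8, 6, 8]
xgb_pred = [12, 2, 10]

ery = ensemble_average(rf_pred, xgb_pred)
ryoe = [a - e for a, e in zip(actual_yards, ery)]
print("ERY:", ery)
print("RYOE:", ryoe)
print("RMSE:", rmse(actual_yards, ery))
print("MAE:", mae(actual_yards, ery))
print("MAPE:", mape(actual_yards, ery))
```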
V. Conclusion
By combining Voronoi diagrams with the dynamic understanding of field control proposed by Fernandez and Bornn, and by combining the signal-capturing properties of different types of models in an ensemble model, the RYOE metric will allow NFL teams and enthusiasts alike to understand one of the least understood aspects of the game with more nuance.
Thank you for reading this brief post. I would like to thank all of the faculty and staff at NC State’s Institute for Advanced Analytics for their instruction and overall support as I pursue a Master’s in Analytics. I would also like to thank Michael Lopez, Javier Fernandez, Luke Bornn, and countless nobly minded individuals on StackOverflow for their inspiration and troubleshooting acumen.
All the relevant code and analysis for this project can be found on GitHub.
VI. Appendix
Data Considerations - Returnable Punts & Penalties
The models were trained and tested on data that included any punt that was conceivably returnable or advanceable at any point. This included all punts that landed (or could have landed) within the field of play, and excluded only punts that landed directly out of bounds.
The training and test data also excluded any plays in which the return team committed a penalty, in order to avoid inflated punt return or expected punt return values.
Individual Influence and Field Control
In their 2018 paper, “Wide Open Spaces”, Fernandez and Bornn proposed a measure of individual influence on a part of the soccer pitch that could account for a player’s orientation, velocity, and distance to the ball. Quoting from that same paper:
“Specifically, the player’s influence I at a given location p for a given player i at time t is defined by a bivariate normal distribution with mean µi(t) and covariance matrix Σi(t), given the player’s velocity s and angle θ. For a given location in space p at time t, the probability density function of player i influence area is defined by a standard multivariate normal distribution.”
A team’s field control over a given point on the field is then defined as the following, again citing the authors’ 2018 paper:

$$PC_{(p,t)} = \sigma\left(\sum_{i} I(p, i, t) - \sum_{j} I(p, j, t)\right)$$

where σ is the logistic function, I is an individual player’s influence, and i and j refer to the players on the two respective teams. In this case, I adapted the formula so that i would represent the return team and j would represent the coverage team.
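To make the idea concrete, here is a simplified Python sketch of influence and field control. It is not the implementation used in this project or the exact formulation in the paper: the influence here is a Gaussian kernel stretched along the player's direction of travel and centered half a second ahead of the player, and the constants `radius` and `max_speed` are hypothetical values chosen for illustration. Field control is taken as the logistic function of the return team's summed influence minus the coverage team's.

```python
import math


def influence(point, pos, vel, radius=4.0, max_speed=13.0):
    """Simplified influence of one player at `point` (all units in yards).

    `pos` is the player's (x, y); `vel` is (vx, vy) in yards per second.
    `radius` and `max_speed` are illustrative constants, not fitted values.
    """
    speed = math.hypot(vel[0], vel[1])
    theta = math.atan2(vel[1], vel[0]) if speed > 0 else 0.0
    # Faster players reach further along their direction of travel and
    # less to either side. The ratio is capped so the spread stays positive.
    ratio = min(speed / max_speed, 0.9) ** 2
    spread_along = radius * (1 + ratio)
    spread_across = radius * (1 - ratio)
    # Center the kernel where the player will be in half a second.
    mean_x = pos[0] + 0.5 * vel[0]
    mean_y = pos[1] + 0.5 * vel[1]
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Rotate the offset into the player's direction-of-travel frame.
    along = dx * math.cos(theta) + dy * math.sin(theta)
    across = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.exp(
        -0.5 * ((along / spread_along) ** 2 + (across / spread_across) ** 2)
    )


def field_control(point, return_team, coverage_team):
    """Logistic function of summed return-team minus coverage-team influence.

    Each team is a list of (position, velocity) tuples.
    Values above 0.5 favor the return team.
    """
    diff = sum(influence(point, p, v) for p, v in return_team) - sum(
        influence(point, p, v) for p, v in coverage_team
    )
    return 1.0 / (1.0 + math.exp(-diff))


# One stationary player per side, 10 yards apart.
returners = [((0.0, 0.0), (0.0, 0.0))]
coverage = [((10.0, 0.0), (0.0, 0.0))]
print(field_control((5.0, 0.0), returners, coverage))
print(field_control((0.0, 0.0), returners, coverage))
```

In this symmetric setup the midpoint is evenly contested, while points nearer the returner tilt toward the return team.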
List of Features
- Area of the Voronoi cell around the ball. The Voronoi diagram calculation excluded players on the return team
- Mean team field control within the ball’s Voronoi area
- 75th percentile of team field control within the ball’s Voronoi area
- Percent of 1x1 yard squares within Voronoi area with a field control over a certain threshold
- Standard deviation of team field control within the ball’s Voronoi area
- Number of punt rushers
- Number of punt safeties
- Time of snap and punt
- Distance of returner to ball at the end of the punt (return or hitting the ground)
- Number of gunners
- Number of vises
- Punt hangtime
- Type of punt
- Was the ball punted in the “wrong” direction, relative to the coverage formation?
- Was the ball returned in the “wrong” direction, relative to the return formation?
- Punt distance
- Was the snap OK?
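The Voronoi-based features at the top of this list can be approximated on a grid. The Python sketch below is an assumption-laden illustration, not the project's code: it treats the field as 120 by 53 one-yard squares (the true width is 53.3 yards), assigns a square to the ball's Voronoi cell when its center is closer to the ball than to any coverage player, and takes the field control function as a caller-supplied callable named `control_fn`. The 0.5 threshold is a placeholder.

```python
import math
import statistics

FIELD_LENGTH = 120  # yards, including end zones
FIELD_WIDTH = 53    # yards, rounded down from 53.3 for a 1x1 grid


def ball_voronoi_squares(ball, coverage_players):
    """Centers of 1x1 yard squares closer to the ball than to any coverage player."""
    squares = []
    for gx in range(FIELD_LENGTH):
        for gy in range(FIELD_WIDTH):
            center = (gx + 0.5, gy + 0.5)
            d_ball = math.dist(center, ball)
            if all(d_ball < math.dist(center, p) for p in coverage_players):
                squares.append(center)
    return squares


def voronoi_features(ball, coverage_players, control_fn, threshold=0.5):
    """Summarize field control inside the ball's (grid-approximated) Voronoi cell."""
    squares = ball_voronoi_squares(ball, coverage_players)
    values = [control_fn(sq) for sq in squares]
    if len(values) < 2:
        # Too few squares to compute a spread or a percentile.
        return {"area_sq_yards": len(squares)}
    return {
        "area_sq_yards": len(squares),
        "mean_control": statistics.fmean(values),
        "p75_control": statistics.quantiles(values, n=4)[2],
        "sd_control": statistics.pstdev(values),
        "pct_over_threshold": sum(v > threshold for v in values) / len(values),
    }


# Ball at midfield, one coverage player 10 yards downfield of it.
features = voronoi_features(
    ball=(60.0, 26.5),
    coverage_players=[(70.0, 26.5)],
    control_fn=lambda square: 0.6,  # constant stand-in for a real control surface
)
print(features)
```

An exact polygon-based Voronoi computation would give a smoother area estimate, but the grid version lines up naturally with the "percent of 1x1 yard squares" feature.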
Model Tuning
Random Forest
The random forest model was tuned in a two-step process. First, using the randomForest function in R, I identified the point where the error leveled off as the number of trees increased (near 100). Then, using tuneRF, I identified 5 as the mtry value with the lowest out-of-bag (OOB) error.
XGBoost
I also tuned the XGBoost model in two steps. First, using 10-fold cross-validation, I identified 6 as the ideal value for nrounds.
Then, using a grid in caret, I found the optimal values for the eta, max_depth, and gamma parameters.
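The general shape of a cross-validated grid search can be sketched in a few lines of Python. This is a generic illustration rather than the caret workflow used here: the model is a deliberately trivial stand-in (`shrunken_mean`, a hypothetical helper that predicts a scaled training mean), and in practice the `fit_predict` callable would wrap an XGBoost fit with parameters such as eta, max_depth, and gamma.

```python
import itertools
import math


def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, valid
        start += size


def cv_rmse(fit_predict, X, y, params, k=10):
    """RMSE pooled over all held-out rows across the k folds."""
    squared_error, n = 0.0, 0
    for train_idx, valid_idx in k_fold_indices(len(y), k):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in valid_idx],
            **params,
        )
        for i, pred in zip(valid_idx, preds):
            squared_error += (y[i] - pred) ** 2
            n += 1
    return math.sqrt(squared_error / n)


def grid_search(fit_predict, X, y, param_grid, k=10):
    """Try every parameter combination and keep the one with the lowest CV RMSE."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_rmse(fit_predict, X, y, params, k)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def shrunken_mean(train_X, train_y, valid_X, shrink=1.0):
    """Toy stand-in model: predicts a scaled version of the training mean."""
    center = shrink * sum(train_y) / len(train_y)
    return [center for _ in valid_X]


# Invented data: 20 plays that all gained 5 yards.
X = list(range(20))
y = [5.0] * 20
best_params, best_score = grid_search(
    shrunken_mean, X, y, {"shrink": [0.5, 1.0, 1.5]}, k=10
)
print(best_params, best_score)
```

Pooling the squared errors across folds, as done here, is one of two common conventions; averaging the per-fold RMSE values is the other.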
Model Selection
The choice to use an ensemble model was part theoretical, part educational. Given the small sample size available for a sport with only 16 games per season, I thought that each model might be able to capture different aspects of the “signal” from the data, and that an ensemble model might allow me to have “two bites at the apple”, as it were. In addition, having learned about ensemble models in courses at the Institute for Advanced Analytics, I was interested in seeing how an ensemble model performed on the holdout set.
Potential for other models
This approach leaves room for further innovation, as NFL team analysts and others might wish to incorporate additional models into the ensemble. Some potential candidates include (but are not limited to) a K-nearest neighbors model, other boosted tree-based models (such as LightGBM), neural networks, or any of many different deep learning models.